All posts by Zeke

How not to calculate temperatures, part 3

My disagreement with Steven Goddard has focused on his methodology. His approach is quite simple: he just averages all the temperatures by year for each station, and then averages all the annual means together for all stations in each year.
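In code, Goddard's approach amounts to something like the following minimal Python sketch (hypothetical column names; no anomalies, no gridding, no spatial weighting):

```python
import pandas as pd

def averaged_absolutes(obs: pd.DataFrame) -> pd.Series:
    """Goddard-style average: a plain mean of absolute temperatures.

    `obs` is assumed to have columns ['station_id', 'year', 'tavg'],
    with tavg in degrees C.
    """
    # Mean absolute temperature for each station in each year...
    station_year = obs.groupby(["station_id", "year"])["tavg"].mean()
    # ...then a straight average across whatever stations reported that year.
    return station_year.groupby("year").mean()
```

Nothing in this calculation knows where the stations are or which of them report in a given year, which is exactly where the trouble starts.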

I’ve been critical of this approach because I’ve argued that it can result in climatology-related biases when the composition of the station network changes. For example, if the decline in reporting stations post-1990 resulted in fewer stations from lower-latitude areas, it would introduce a cooling bias into the resulting temperature record that is unrelated to actual changing temperatures.

[Figure: Goddard's method vs. gridded anomalies (raw data) vs. NCDC's fully adjusted record]

Let's take a look at how big a difference the choice of methods makes. The figure above shows Goddard's method using the raw data in red, the correct method (gridded anomalies) using the raw data in blue, and the fully homogenized data (i.e. NCDC's official temperature record) in green. Goddard's method serves to exaggerate the effect of adjustments, though significant adjustments remain even when using a method unbiased by changes in underlying climatology. The majority of these adjustments are for changing time of observation at measurement stations, while much of the remainder corrects for a cooling bias in maximum temperatures due to the change from liquid-in-glass thermometers to MMTS instruments. NCDC's adjustments have been discussed at length elsewhere, so for the remainder of this post I'm going to focus on the factors that result in the difference between the red and blue lines. These differences may seem small, but they result in a non-negligible difference in the trend, and drive incorrect claims like Goddard's assertion that the U.S. has been cooling since the 1930s and that most of the warming post-2000 is due to "fabricated data" from infilling.

Continue reading How not to calculate temperatures, part 3

How not to calculate temperatures, part 2

Unfortunately some folks who really should know better paid attention to the pseudonymous Steven Goddard, which spawned a whole slew of incorrect articles in places like the Telegraph, Washington Times, and Investor's Business Daily about how the U.S. has been cooling since the 1930s. It was even the top headline on the Drudge Report for a good portion of the day. This isn't even true in the raw data, and certainly not in the time-of-observation-corrected or fully homogenized datasets.

As mentioned earlier, Goddard’s fundamental error is that he just averages absolute temperatures with no use of anomalies or spatial weighting. This is fine when station records are complete and well distributed; when the station network composition is changing over time or the stations are not well-distributed, however, it gives you a biased result as discussed at length earlier.

There is a very simple way to show that Goddard's approach can produce bogus outcomes. Let's apply it to the entire world's land area, instead of just the U.S., using GHCN monthly data:

[Figure: Global land temperatures via averaged absolutes]

Egads! It appears that the world's land has warmed 2C over the past century! It's worse than we thought!

Or we could use spatial weighting and anomalies:

[Figure: Global land temperatures via gridded anomalies]

Now, I wonder which of these is correct? Goddard keeps insisting that it's the first, and that evil anomalies just serve to manipulate the data to show warming. But so it goes.

Update

Anthony Watts asked for the code I used to generate these figures, something I should have included in the initial post.

The raw GHCN v3 data is available here.

The code for the first figure (averaged absolutes) is here (Excel version here).

The code for the second figure (gridded anomalies) is here.

The land mask used in calculating grid weights in the second figure is here.

Also, for reference, here is how my Figure 2 compares to the land records from all the other groups:

[Figure: Global land temperature comparison across groups, 1900-2014]

How not to calculate temperature

The blogger Steven Goddard has been on a tear recently, castigating NCDC for making up "97% of warming since 1990" by infilling missing data with "fake data". The reality is much more mundane, and the dramatic findings are nothing other than an artifact of Goddard's flawed methodology. Let's look at what's actually going on in more detail.

What's up with Infilling?

The U.S. Historical Climatology Network (USHCN) was put together in the late 1980s, with 1218 stations chosen from a larger population of 7000-odd cooperative network stations based on their long continuous records and geographical distribution. The network's composition has been left largely unchanged, though since the late 1980s a number of stations have closed or stopped reporting. Much of this is due to the nature of the instruments; many USHCN stations are staffed by volunteers rather than automated, and these volunteers may quit or pass away over the decades. The number of reporting USHCN stations has slowly declined from 1218 in the 1980s to closer to 900 today, as shown in the figure below. As an aside, this is quite similar to what happened with GHCN, which birthed the frustratingly persistent "march of the thermometers" meme. Unsurprisingly, the flaw in Goddard's analysis mirrors that of E.M. Smith's similar claims regarding GHCN.

[Figure: Number of reporting USHCN stations over time, raw vs. adjusted]

As part of its adjustment process, USHCN infills missing station records based on a spatially weighted average of surrounding station anomalies (plus the long-term climatology of that location) to generate absolute temperatures. This is done as a final step after TOBs adjustments and pairwise homogenization, and results in 1218 records every month. The process is really somewhat unnecessary, as it simply mirrors the effect of spatial interpolation (e.g. gridding or something more complex), but I'm told that it's a bit of an artifact to help folks more easily calculate absolute temperatures without having to do something fancy like add a long-term climatology field to a spatially interpolated anomaly field. Regardless, it has relatively little effect on the results [note that 2014 shows only the first four months]:

[Figure: USHCN adjusted temperatures, infilled vs. non-infilled]

You can’t really tell the difference between the two visually. I’ve plotted the difference below, with a greatly truncated scale:

[Figure: Difference between infilled and non-infilled USHCN data]

Where did Goddard go wrong?

Goddard made two major errors in his analysis, which produced results showing a large bias due to infilling that doesn't really exist. First, he is simply averaging absolute temperatures rather than using anomalies. Absolute temperatures work fine if and only if the composition of the station network remains unchanged over time. If the composition does change, you will often find that stations dropping out will result in climatological biases in the network due to differences in elevation and average temperatures that don't necessarily reflect any real information on month-to-month or year-to-year variability. Lucia covered this well a few years back with a toy model, so I'd suggest that people who are still confused about the subject consult her spherical cow.
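In the spirit of that spherical cow, here is a toy illustration with synthetic data: two stations with very different climatologies but an identical 1 C/century trend, where the colder station stops reporting in 1950.

```python
import numpy as np

years = np.arange(1900, 2001)
trend = 0.01 * (years - years[0])        # identical 1 C/century warming at both sites

warm_station = 20.0 + trend              # low-elevation site, ~20 C climatology
cold_station = 5.0 + trend               # mountain site, ~5 C climatology
cold_station[years > 1950] = np.nan      # the mountain station stops reporting in 1950

# Averaged absolutes: a ~7.5 C step appears in 1951 purely from the dropout.
absolutes = np.nanmean(np.vstack([warm_station, cold_station]), axis=0)

# Anomaly method: subtract each station's own 1900-1950 mean first.
warm_anom = warm_station - np.nanmean(warm_station[years <= 1950])
cold_anom = cold_station - np.nanmean(cold_station[years <= 1950])
anomalies = np.nanmean(np.vstack([warm_anom, cold_anom]), axis=0)

print(absolutes[-1] - absolutes[0])      # ~8.5 C: the real trend plus the composition artifact
print(anomalies[-1] - anomalies[0])      # ~1 C: just the real trend
```

The absolute average inherits the climatological difference between the two sites; the anomaly average does not.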

His second error is to not use any form of spatial weighting (e.g. gridding) when combining station records. While the USHCN network is fairly well distributed across the U.S., it's not perfectly so, and some areas of the country have considerably more stations than others. Not gridding can also exacerbate the effect of station drop-out when the stations that drop out are not randomly distributed.

The way that NCDC, GISS, Hadley, myself, Nick Stokes, Chad, Tamino, Jeff Id/Roman M, and even Anthony Watts (in Fall et al) all calculate temperatures is by taking station data, translating it into anomalies by subtracting the long-term average for each month from each station (e.g. the 1961-1990 mean), assigning each station to a grid cell, averaging the anomalies of all stations in each gridcell for each month, and averaging all gridcells each month weighted by their respective land area. The details differ a bit between each group/person, but they produce largely the same results.
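As a rough Python illustration of that recipe (the column names and the cosine-of-latitude area weighting are simplifying assumptions of mine, not any particular group's implementation):

```python
import numpy as np
import pandas as pd

def gridded_anomalies(obs, base=(1961, 1990), dlat=2.5, dlon=3.5):
    """Station data -> area-weighted gridded-anomaly average.

    `obs` is assumed to have columns
    ['station_id', 'lat', 'lon', 'year', 'month', 'tavg'] (deg C).
    """
    df = obs.copy()

    # 1. Anomalies: subtract each station's long-term mean for each calendar month.
    base_mask = df["year"].between(*base)
    clim = (df[base_mask].groupby(["station_id", "month"])["tavg"]
            .mean().rename("clim"))
    df = df.join(clim, on=["station_id", "month"])
    df["anom"] = df["tavg"] - df["clim"]

    # 2. Assign each station to a lat/lon grid cell.
    df["cell_lat"] = (np.floor(df["lat"] / dlat) + 0.5) * dlat
    df["cell_lon"] = (np.floor(df["lon"] / dlon) + 0.5) * dlon

    # 3. Average all station anomalies within each cell for each month.
    cells = (df.groupby(["cell_lat", "cell_lon", "year", "month"])["anom"]
             .mean().reset_index())

    # 4. Average the cells, weighted here by cos(latitude) as a crude area proxy.
    cells["w"] = np.cos(np.deg2rad(cells["cell_lat"]))
    monthly = (cells.groupby(["year", "month"])
               .apply(lambda g: np.average(g["anom"], weights=g["w"])))
    return monthly.groupby(level="year").mean()     # annual means
```

A proper land-area weight (like the land mask linked above) would replace the cosine term, but the structure is the same.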

Let's take a quick look at USHCN raw data to see how big a difference gridding and anomalies make. I went ahead and wrote up some quick code that lets me easily toggle which dataset is used (raw or adjusted), whether anomalies or absolutes are used, whether the data is gridded or not, and whether infilled data is used or not (in the adjusted case; raw USHCN data has no infilling). It's available here for anyone who is interested (though note that it's in STATA).

Here is what we get if we take USHCN raw data and compare the standard approach (gridded anomalies) to Goddard’s approach (averaged absolutes)  [note that 2014 is omitted from all absolute vs anomaly comparisons for obvious reasons]:

[Figure: USHCN raw data, gridded anomalies vs. averaged absolutes]

This compares absolutes to anomalies by re-baselining both to 1961-1990 after the annual CONUS series have been calculated. The differences stand out much more starkly in the difference series below:

[Figure: USHCN raw data, gridded anomalies minus averaged absolutes]

This difference is largely due to the changing composition of  stations in the network over time. Interestingly, simply spatially gridding absolute temperatures eliminates much of the difference, presumably because the other stations within the grid cell have similar climatologies and thus avoid skewing national reconstructions.

[Figure: USHCN raw data, gridded anomalies vs. gridded absolutes]

The difference series is correspondingly much smaller:

[Figure: USHCN raw data, gridded anomalies minus gridded absolutes]

Here the differences are pretty minimal. If Goddard is averse to anomalies, a simple spatial gridding would eliminate most of the problem (I'm using USHCN's standard 2.5×3.5 lat/lon grid cells, though the 5×5 cells that Hadley uses would work as well).

So what is the impact of infilling?

Let's do a quick exercise to look at the impact of infilling using four different approaches: the standard gridded anomaly approach, an averaged (non-gridded) anomaly approach, a gridded absolute approach, and Goddard's averaged absolute approach:

[Figure: Effect of infilling under the four calculation approaches]

Goddard’s approach is the only one that shows a large warming bias in recent years, though all absolute approaches unsurprisingly show a larger effect of infilling due to the changing station composition of the non-infilled data. We also have a very good reason to think that there has not been a large warming bias in USHCN in the last decade or so. The new ideally-sited U.S. Climate Reference Network (USCRN) agrees with USHCN almost perfectly since it achieved nationwide spatial coverage in 2005 (if anything, USCRN is running slightly warmer in recent months):

[Figure: USCRN vs. USHCN national temperatures since 2005]

Update

Another line of evidence suggesting that the changing composition is not biasing the record comes from Berkeley Earth. Their U.S. temperature record has more stations in the current year than in any prior year (close to 10,000) and agrees quite well with NCDC's USHCN record.

Update #2

The commenter geezer117 suggested a simple test of the effects of infilling: compare each infilled station's value to its non-infilled neighbors. My code can be easily tweaked to allow this by comparing a reconstruction based on only infilled stations to a reconstruction based on no infilled stations, only using grid cells that have both infilled and non-infilled stations (to ensure we are comparing areas of similar spatial coverage).
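A sketch of how that comparison could be coded (hypothetical column names; my actual code is in STATA, this is just a Python illustration):

```python
import pandas as pd

def compare_infilled(obs: pd.DataFrame) -> pd.DataFrame:
    """Compare reconstructions built from infilled vs. non-infilled values.

    `obs` is assumed to have columns
    ['cell', 'station_id', 'year', 'month', 'anom', 'infilled'], where
    `infilled` flags station-months whose values were infilled. Only grid
    cells containing both kinds of data in a given month are used, so the
    two reconstructions cover the same area.
    """
    flags = (obs.groupby(["cell", "year", "month"])["infilled"]
             .agg(has_infilled="any", all_infilled="all").reset_index())
    shared = flags[flags["has_infilled"] & ~flags["all_infilled"]]
    sub = obs.merge(shared[["cell", "year", "month"]], on=["cell", "year", "month"])

    # Cell means for each group, then a simple average over cells.
    cell_means = (sub.groupby(["infilled", "cell", "year", "month"])["anom"]
                  .mean().reset_index())
    return (cell_means.groupby(["infilled", "year", "month"])["anom"].mean()
            .unstack("infilled")
            .rename(columns={False: "non_infilled", True: "infilled"}))
```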

The results are unsurprising: the two sets are very similar (infilled stations are actually warming slightly less quickly than non-infilled stations since 1990, though they are not significantly different within the uncertainties due to methodological choices). This is because infilling is done by using a spatial weighted average of nearby station anomalies (plus the average climatology of the missing station to get absolute temperatures).
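That infilling calculation amounts to something like the following rough sketch (not NCDC's actual code; the exponential distance weighting and the neighbor selection here are simplified assumptions):

```python
import numpy as np

def infill_station(missing_clim, neighbor_anoms, neighbor_dists_km, scale_km=500.0):
    """Estimate an absolute temperature for a missing station-month.

    missing_clim      : long-term climatology (deg C) of the missing station
                        for this calendar month
    neighbor_anoms    : anomalies (deg C) of nearby reporting stations
    neighbor_dists_km : distances from the missing station to those neighbors
    scale_km          : assumed e-folding distance for the weights
    """
    anoms = np.asarray(neighbor_anoms, dtype=float)
    dists = np.asarray(neighbor_dists_km, dtype=float)
    weights = np.exp(-dists / scale_km)            # simple distance decay (assumption)
    infilled_anom = np.sum(weights * anoms) / np.sum(weights)
    # Add the local climatology back to turn the anomaly into an absolute value.
    return missing_clim + infilled_anom
```

Because the infilled anomaly is just a weighted average of its neighbors, it adds essentially no new information relative to spatial interpolation, which is why it has so little effect on the national average.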

 

[Figure: Reconstruction from infilled stations vs. non-infilled stations]

UHI Paper Finally Published In JGR!

It's been a long road since I wrote a blog post titled UHI in the USA back in 2010, with pit stops at a number of conferences, one internal round of peer review at NOAA and two with the journal, and many helpful comments from folks, but our paper is finally out in the Journal of Geophysical Research (JGR). You can find a non-paywalled version here.

All our code and data are available on the NCDC FTP site, and I encourage folks to do their own analysis using it as a starting point if interested. We also have a guest post up at RealClimate that goes into some detail regarding the methods and results.

Correlations of Anomalies over Distance

The subject of the correlation of temperature anomalies over distance came up in the NCDC thread, and I figured it would be a good excuse to run an analysis that I've been meaning to do for a while. This is not particularly novel; indeed, it was done as far back as Hansen and Lebedeff in 1987 (and probably before). But it is instructive to examine nonetheless.

To start with, I used the USHCN v2.5 data from NCDC's FTP site. I converted the observations into anomalies relative to a 1971-2000 baseline (not strictly needed when looking at correlations over time, but I was doing another analysis at the same time and it won't affect the results). I next created all possible pairs of the 1218 USHCN stations, yielding upwards of 700,000 unique pairs. I calculated both the distance between each pair (based on the lat/lon) and the correlation between their anomalies over the 1895-2012 period, and made a simple scatter plot of one against the other:

[Figure: Correlation of monthly anomalies vs. distance for USHCN station pairs]
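For reference, the pairwise calculation behind this kind of plot looks something like the following (hypothetical variable names; distances from the haversine formula):

```python
import itertools
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def pairwise_corr_vs_distance(stations):
    """`stations` is assumed to map station_id -> {'lat', 'lon', 'anom'},
    where 'anom' is a 1-D array of monthly anomalies on a common
    1895-2012 time axis, with NaN where data are missing."""
    out = []
    for a, b in itertools.combinations(stations, 2):
        sa, sb = stations[a], stations[b]
        ok = ~np.isnan(sa["anom"]) & ~np.isnan(sb["anom"])
        if ok.sum() < 120:                   # require at least 10 years of overlap
            continue
        r = np.corrcoef(sa["anom"][ok], sb["anom"][ok])[0, 1]
        d = haversine_km(sa["lat"], sa["lon"], sb["lat"], sb["lon"])
        out.append((d, r))
    return np.array(out)                     # columns: distance (km), correlation
```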


Continue reading Correlations of Anomalies over Distance

A Defense of the NCDC, and of Basic Civility

There is a cancer growing in the climate blogging world. It is a cancer of bad faith, a default assumption that the other side must be lying, stupid, or in the pay of someone nefarious. It manifests itself in one-sided discourse, in personal attacks, and in the blind rejection of results that do not conform to a specific world view.

I tend to be hard to rile up by nature, but a recent article on Fox News was too egregious to be ignored. It was not the criticism of data adjustments that was the problem (though this was somewhat unfounded, as I will discuss later), but rather the remark at the end attributed to Anthony Watts. He said:

 Is history malleable? Can temperature data of the past be molded to fit a purpose? It certainly seems to be the case here, where the temperature for July 1936 reported … changes with the moment. In the business and trading world, people go to jail for such manipulations of data.

I’m sorry, but accusing people that you disagree with of fraud, and even suggesting that they go to jail, is simply beyond the pale. Not only does it stymie any possibility of constructive scientific discourse; it is also blatantly unethical. Fraud should only be alleged in extreme cases when there is strong evidence supporting it, not simply because the results don’t match your preconceptions. If you disagree with someone’s approach and methods, the proper way to respond is to create your own approach and demonstrate that it is superior. That is the way science moves forward. To descend into personal attacks, to politicize the science, is deeply irresponsible.

This type of discourse also creates immense distractions for the scientists involved. Many of the folks at the NCDC have spent a significant portion of their time over the past three years dealing with two different GSA investigations, various congressional hearings, and the need to respond to media furors like the Fox News story. This is not to say that we cannot be skeptical of the results of scientists like those at NCDC, but rather that such skepticism is best expressed through scientific arguments rather than political or media attacks.

In the spirit of civility, I would ask Anthony to retract his remarks. He may well disagree with NCDC’s approach and results, but accusing them of fraud is one step too far. Given all the steps the scientists at NCDC have taken to publish their data and code, to make their papers accessible without a pay-wall, and to work with external groups to evaluate and verify their findings, there is no reasonable justification for the allegation of fraudulent behavior, and certainly not to suggest the scientific work they are doing is a jailable offense.

Continue reading A Defense of the NCDC, and of Basic Civility

UHI in California

I’m rather partial to California, having lived here for the last two years. I’ve also done a lot of work on UHI issues. So I was rather interested in Anthony’s post today, and figured a modernization effort (using 1 km resolution remote sensing products) was in order.

Given 54 USHCN stations in California and the following urbanity proxies (a sketch of the classification step follows the list):

  • GRUMP (Urban or Rural binary station designations by the Global Rural Urban Mapping Project – Columbia University)
  • Nightlights (Night brightness of km^2 pixel, 30 cutoff used; NOAA)
  • ISA (Percentage of Km^2 gridded areas covered by Constructed Impervious Surfaces, pavement, buildings, concrete, etc., 10% cutoff used; NOAA)
  • Population Growth (Population growth per square kilometer, 1930-2000, cutoff 10 people used; NOAA)
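A minimal sketch of that classification step (hypothetical column names; treating each cutoff as a strict "greater than" is my assumption):

```python
import pandas as pd

def urbanity_buckets(stations: pd.DataFrame) -> pd.DataFrame:
    """Flag each station as urban (True) or rural (False) under each proxy.

    `stations` is assumed to have one row per USHCN station with columns
    ['grump_urban' (bool), 'nightlights', 'isa_pct', 'pop_growth_km2'].
    """
    return pd.DataFrame({
        "grump":       stations["grump_urban"],
        "nightlights": stations["nightlights"] > 30,     # brightness cutoff of 30
        "isa":         stations["isa_pct"] > 10,         # 10% impervious surface cutoff
        "pop_growth":  stations["pop_growth_km2"] > 10,  # 10 people/km^2 growth cutoff
    }, index=stations.index)
```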

We end up with these buckets:

These trends from 1895-2012:

Fig 1.   Circles are raw data, diamonds are TOBs-adjusted data, and triangles are fully adjusted (F52) data. Solid shapes are urban stations and hollow shapes are rural stations.

And these trends from 1960-2012:

Fig 2.   Circles are raw data, diamonds are TOBs-adjusted data, and triangles are fully adjusted (F52) data. Solid shapes are urban stations and hollow shapes are rural stations.

Interesting. I could probably reduce the CIs a bit if I looked at the trend of the difference rather than the difference of the trends.
Update
Here is the metadata for the CA USHCN stations: metadata
Raw data by proxy/urbanity is here
TOBs data by proxy/urbanity is here
Fully adjusted data by proxy/urbanity is here

On Volcanoes and their Climate Response

By Steven Mosher and Robert Rohde

One of the more interesting findings of the Berkeley Earth papers centers around volcanoes and the climate's reaction to them. The proposition that volcanic eruptions, when conditions are right, cool the temperature of the globe is accepted by everyone who understands the physics. Nevertheless, there are some ways of looking at this problem where you can fool yourself and others. Start here with a back-of-the-envelope glance at the problem. On the surface it seems to make sense: volcanoes cool the climate, so line up the charts and the effect should be clear. The approach fails, however, to take notice of the fact that volcanoes erupt at random times relative to the climate's natural variability. If the climate was experiencing a warm year, say 1C above normal, and a volcano cooled the planet by 0.5C, the cooling would be lost in the noise. It would be invisible to someone who didn't want to find it. To be fair, Willis tried a second method, looking at the average of the two years prior to and the average of the two years after a volcano. This too is a weak method of signal detection. Below, I'll illustrate the problem with a little toy example, and then tell you a couple of ways to find the volcano.
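The toy example itself is in the full post; as a rough stand-in, here is a minimal sketch (synthetic numbers, nothing from Berkeley Earth) of why a single-year or two-year comparison struggles:

```python
import numpy as np

rng = np.random.default_rng(0)

n_years = 200
temps = rng.normal(0.0, 0.5, n_years)      # assumed ~0.5 C of interannual variability

eruptions = rng.choice(np.arange(5, n_years - 5), size=8, replace=False)
for y in eruptions:
    temps[y] -= 0.5                         # each eruption cools its year by 0.5 C
    temps[y + 1] -= 0.3                     # assume some cooling lingers a second year

# Naive detection: eruption year vs. the year before, and the mean of the two
# years after vs. the mean of the two years before, eruption by eruption.
single_year = np.array([temps[y] - temps[y - 1] for y in eruptions])
epoch_diff = np.array([temps[y + 1:y + 3].mean() - temps[y - 2:y].mean()
                       for y in eruptions])

# The injected cooling is 0.5 C, yet the eruption-to-eruption scatter in both
# estimators is of comparable size, so individual eruptions are easy to miss.
print(single_year.mean(), single_year.std())
print(epoch_diff.mean(), epoch_diff.std())
```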

Continue reading On Volcanoes and their Climate Response

Initial thoughts on the Watts et al draft

While doing a detailed analysis of the results is not possible until the actual station siting classifications are released, I’ll provide an initial set of thoughts and discuss potential areas where the paper could be improved.

As everyone who follows climate blogs is well aware by now, Anthony Watts released a draft of his new paper yesterday, which claims that well-sited stations have a mean warming trend in the raw data that is about 0.1 C per decade less (0.155 C/decade) than that of poorly sited stations (0.248 C/decade). This is a significantly different result from prior papers that have looked at the issue of station siting (e.g. Menne et al 2010, Fall et al 2011, Muller et al), which all found no significant differences in mean temperature trends between well- and poorly sited stations.

The Watts et al draft differs from prior papers in that it uses the classification scheme from Leroy 2010 rather than the older Leroy 1999 criteria. The difference between the old and new Leroy papers is the inclusion of the total surface area of heat sinks, rather than the simple distance to heat sinks. This actually results in less strict criteria than those of Leroy 1999, and considerably more stations are rated as class 1 or 2 (160) than under the prior classification scheme (~80).

This should by itself raise a yellow flag: if a stricter classification scheme found no difference in trend, why would a laxer one result in very significant differences in trends (at least in the raw data)? Intuitively the opposite should be true; if good station siting is correlated with lower trends, then more restrictive groups of good stations should result in lower trends, all things being equal.

The Watts et al draft focuses the majority of its analysis on the raw data, comparing those results to the adjusted data to suggest that the adjustments are biasing the trend upwards. This is different from Fall et al, which examined raw, time-of-observation-adjusted, and fully adjusted data but primarily used fully adjusted data in its analysis.

There are good reasons not to use the raw USHCN data for analysis, at least without additional work to control for potential biases. During the period that Watts examines, 1979-2008, a significant portion of USHCN stations were converted from liquid-in-glass (LiG) thermometers in Cotton Region Shelters (CRS) to electronic Maximum-Minimum Temperature Systems (MMTS). These instrument changes also generally involved moving the instrument, as MMTS sensors need to be wired to a power source. There is good evidence that the conversion to MMTS introduced a significant negative bias in maximum temperatures and a modest negative bias in mean temperatures, as shown in the figure below. By looking at 1979-2008 trends in stations whose current equipment is MMTS, Watts incorrectly concludes that MMTS stations have a lower trend, because in effect he is looking at a record that was mostly CRS from 1979 to around 1990 and mostly MMTS from 1990 to 2008. The correct approach would be to examine records from MMTS stations only after the date at which the MMTS instrument was installed.
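A sketch of that filtering step (hypothetical metadata column names; the real USHCN station history files record instrument changes differently):

```python
import pandas as pd

def mmts_only_after_conversion(obs: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Keep MMTS station records only from the MMTS installation year onward.

    `obs`  : station observations with ['station_id', 'year', ...]
    `meta` : one row per station with ['station_id', 'mmts_install_year'],
             NaN where the station never converted from LiG/CRS.
    """
    merged = obs.merge(meta[["station_id", "mmts_install_year"]], on="station_id")
    keep = merged["mmts_install_year"].notna() & (
        merged["year"] >= merged["mmts_install_year"])
    return merged[keep].drop(columns="mmts_install_year")
```

Trends computed on records filtered this way reflect only measurements actually taken with MMTS equipment, rather than a mixed CRS/MMTS history.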

Additionally, during the period from 1979 to 2010 about 250 USHCN stations changed their time of observation (TOBs) from near sunset to early morning. This change in TOBs results in a significant step change in temperature measurements for the stations affected, biasing trend calculations over the period if it is not corrected for. These TOBs changes may be more likely to occur in rural stations than urban stations, as many urban stations had earlier transitions to MMTS instruments with automated readings.

Watts et al find that there are no significant differences in trends between well-sited and poorly sited stations in the adjusted data, but significant differences in the raw data. Given the two major biases known to have occurred during this period, he should be extremely careful to control for sensor type and observation time between his two groups. It may well be that the observed differences are driven more by these factors than by actual station siting, given that prior station siting analyses using stricter classification criteria found no significant differences.

Another useful analysis for Watts et al to do would be to compare Class 1/2 and Class 3/4/5 station records to those from the U.S. Climate Reference Network (CRN), a set of pristinely sited stations, during the period from 2004 to 2012 where CRN data is available. While the period is relatively short, it may be sufficient to yield interesting results.

—————————————————————————————–
It's also worth highlighting Kenneth Fritsch's summary from the prior thread, which covers some similar points and makes a number of additional ones:

After a 300 plus post thread I think it would be time for someone to attempt to summarize the importance and weaknesses of the Watts prepublication. I have not read all 300 plus posts and I have only skimmed through the Watts paper. I am familiar with this subject so skimming can impart a fairly accurate picture.

What I see is that with the newer criteria for classifying stations, the raw USHCN temperature data show larger differences in 1979-2008 trends when stations are grouped by those criteria. That finding, if it can be verified, would be worth a publication. For the bigger picture, however, the question has to be whether that difference exists in the adjusted data. It does not, but Watts attempts to make the point that the data from the higher quality stations with the lower trends are adjusted upwards towards the poorer quality stations with the higher trends. That is an occurrence I have seen with my own calculations using the older classification criteria with TOB and adjusted data. Here I do not see Watts making the connection between that observation and an error in the adjustment methodology, but perhaps my skimming of the article was not sufficient to find it.

Firstly, as I recall, the USHCN data adjustments start with the TOB data and not the raw data, and secondly, I recall that the TOB correction is a major part of the adjustment from raw to adjusted temperatures. The adjustments between TOB and adjusted data are made for USHCN with the change point algorithm, which is invoked from both metadata and measured change point calculations between nearest-neighbor stations. Once the non-homogeneity is established in this manner, the data is adjusted using the nearest-neighbor stations. It is at this point that Watts would have to show the over-homogenization of the algorithm that would lead to the error he claims for the USHCN adjustment process. What I have found by calculating and finding change points between nearest neighbors after the USHCN data has been adjusted with the Menne algorithm is that it is under-adjusted, although these observations could arise from separate problems.

The problem with using a current station rating to predict temperature trend effects is that it says nothing about the past history of the quality of that station, and when you only look at the past 30 years of data, a station change that predates that period and remained more or less constant should not affect a temperature trend. Obviously, in principle a change point algorithm can look into the past and would be better equipped to find these changes. Unfortunately those algorithms are limited, particularly by noisy data. Attempts have been made to use simulated and realistic data to test or benchmark the performance of these algorithms. It might be instructive to determine how one might realistically simulate a changing station criterion for testing the adjustment algorithm. For example, would a slow decay in station quality (rating) be detected with a change point algorithm and without metadata?